ABSTRACT

We present a structure finding algorithm designed to identify galaxy groups 
in photometric redshift data sets: the probability friends-of-friends (pFoF) algo- 
rithm. This algorithm is derived by combining the friends-of-friends algorithm in 
the transverse direction and the photometric redshift probability densities in the 
radial dimension. The innovative characteristic of our group-finding algorithm 
is the improvement of redshift estimation via the constraints given by the trans- 
versely connected galaxies in a group, based on the assumption that all galaxies 
in a group have the same redshift. Tests using the Virgo Consortium Millennium 
Simulation mock catalogs allow us to show that the recovery rate of the pFoF 
algorithm is larger than 80% for mock groups of at least $2 \times 10^{13}\,M_\odot$, while the
false detection rate is about 10% for pFoF groups containing at least ~ 8 net 
members. Applying the algorithm to the CNOC2 group catalogs gives results 
which are consistent with the mock catalog tests. From all these results, we 
conclude that our group-finding algorithm offers an effective yet simple way to 
identify galaxy groups in photometric redshift catalogs. 

Subject headings: galaxies: general 

1. Introduction 



Galaxy groups are sites where the local galaxy number density is higher than in the
field. The majority (~ 60%) of galaxies lie in groups (e.g., Eke et al. 2004; Berlind et al.
2006; Tago et al. 2006), so that galaxy groups provide an excellent location to study the
effect of local environment on galaxy formation and evolution. Unlike galaxy clusters, galaxy
groups, especially those at high redshift, are not easy to detect because of their smaller 
size and the significantly lower hot gas density. The current published galaxy group cat- 
alogs are constructed based on large-scale galaxy redshift surveys using automated group 
finding schemes. The techniques include the popular friends-of-friends algorithm (e.g.,
Geller & Huchra 1983; Merchán & Zandivarez 2005; Eke et al. 2004) and the Voronoi partition
technique (e.g., Gerke et al. 2005). Most of these catalogs list galaxy groups either in
the nearby Universe (z < 0.1) or over a small sky area. Galaxy groups of large sample sizes 
in intermediate and higher redshift space still remain largely unexplored. 

Up to now, most structure finding techniques use spectroscopic redshift or simulated 
catalogs, both containing accurate three-dimensional position information. With the de- 
velopment of the photometric redshift method, the approximate redshifts of all galaxies 
in a photometric multi-band survey can be obtained without time-consuming spectroscopy.
The empirical approach uses a spectral 'training set' to compute the photometric redshift
via an empirical polynomial of galaxy colors and magnitudes (e.g., Connolly et al. 1995;
Hsieh et al. 2005). Since the redshifts are
derived from broadband galaxy colors rather than from spectra, the photometric redshift 
method can estimate the redshift of objects which are too faint for spectroscopy. On the 
other hand, photometric redshift uncertainties are larger by a factor of 50 to 100 than those
measured from spectroscopy. Due to the less accurate distance information in photometric 
redshift catalogs, the main problem of structure finding is the blurring of configurations 
in redshift space, producing unrealistic or elongated structures caused by the large photometric
redshift uncertainties (Botzler et al. 2004). Even with excellent photometric redshift
estimation ($\sigma_z \sim 0.03$), the structures on the small scale will still be largely smeared out.
Furthermore, projection effects make the subtraction of foreground and background galaxy 
contamination important in analyzing structures found using photometric redshift. 

In order to overcome some of these problems, we propose here a method of finding galaxy 
groups in photometric redshift catalogs. The knowledge of galaxy photometric redshift uncer- 
tainty or probability density is required for this method. This group finding methodology is 
based on the idea of the standard friends-of-friends algorithm in the transverse direction, but 
takes into account the photometric redshift probability density to determine the friendship in 
the radial direction. We describe the photometric redshift technique and the error estimation 
for individual galaxies in §2, and present our photometric sample selection criteria in §3. The
group-finding parameters and the algorithm are detailed in §4 and §5. The basic properties
of galaxy groups are quantified in §6. This algorithm is tested in §7 using mock catalogs
constructed from the Virgo Consortium Millennium Simulation (Springel et al. 2005), and
applied to the real observed groups in the Canadian Network for Observational Cosmology
Survey (CNOC2; Yee et al. 2000) in §8. Finally, we present a summary in §9. The analyses
of galaxy group samples from a number of surveys will be presented in future papers. We
adopt the standard cosmological parameters of $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_m = 0.3$, and $\Omega_\Lambda = 0.7$.



2. Empirical Photometric Redshift 
2.1. Training set 



We estimate photometric redshifts using the empirical fitting technique (e.g., Connolly et al.
1995; Hsieh et al. 2005). We express the galaxy redshift as a quadratic polynomial in magnitudes
and colors:



$$z = a_0 + \sum_i a_i m_i + \sum_{i<j} a_{ij}\,(m_i - m_j)^2, \qquad (1)$$

where $m_i$ and $m_j$ are the passband magnitudes and $a_0$, $a_i$, and $a_{ij}$ are the constant term and
the coefficients associated with the magnitudes and colors, respectively. The coefficients in 
equation (1) can be derived by fitting a training set, a catalog which contains both galaxy 
redshift and photometry information. 

Our training set is constructed using data from the Red-Sequence Cluster Survey (RCS;
Gladders & Yee 2005) in four CNOC2 survey patches (Yee et al. 2000) and the GOODS/HDF-N
field (Giavalisco et al. 2004).



The RCS was designed to find galaxy clusters at 0.4 < z < 1.4 using the cluster red-sequence
method with $R_c$ and $z'$ filters. It includes 22 widely separated patches covering
a total area of 90 deg$^2$, observed with the CFHT 12K camera in the Northern Hemisphere
and the CTIO 4m MOSAIC II camera for the Southern Sky. The RCS follow-up covers 33.6
deg$^2$ (corresponding to about 75% of the CFHT RCS fields) observed with the 12K camera
in B and V. The photometry has been carried out using PPP (Yee 1991; Yee et al. 1996)
and internally calibrated using star colors and galaxy counts. It has also been cross-checked
with star colors and counts from Sloan Digital Sky Survey (SDSS) Data Release 3 (DR3;
Abazajian et al. 2005). The RCS follow-up sample is 100% complete to $R_c \sim 24.2$. Further
details on the data and on the photometric reduction can be found in Hsieh et al. (2005).
The CNOC2 survey covers over 1.5 deg$^2$ of sky with a total sample of ~ 6200 galaxies (up
to z ~ 0.55) with $R_c < 22.0$; 1727 of these galaxies overlap with the RCS sample.



The GOODS HDF-N field allowed us to extend our training set sample to larger redshifts.
The GOODS is a survey based on multi-band imaging data obtained with the Advanced
Camera for Surveys (ACS) on the Hubble Space Telescope (HST). It covers two
fields, HDF-N and CDF-S, with a total area of about 320 arcmin$^2$, a $5\sigma$ limiting magnitude
in the R passband (on the AB system) of 26.6, and a redshift range from 0.5 to 1.5. We have
used publicly available $BVRz'$ photometry (Capak et al. 2004) and spectroscopic redshifts
(Wirth et al. 2004; Cowie et al. 2004) for 2661 galaxies in the HDF-N field. To match the
RCS zero point, the GOODS magnitudes have been corrected following Hsieh et al. (2005).
As a whole, our training set contains 3,988 galaxies observed in $BVR_cz'$ up to z ~ 1.4.
The photometry uncertainties in each passband are $\Delta B \sim 0.04$, $\Delta V \sim 0.04$, $\Delta R_c \sim 0.02$,
and $\Delta z' \sim 0.04$. Further details on the properties of this training sample can be found in
Hsieh et al. (2005).



2.2. Photometric Redshift Estimation and Associated Error 

To minimize the dispersion between photometric and spectroscopic redshifts, we separate
the training set galaxies into 19 color-magnitude cells in the observed frame (Fig. 1) to
differentiate roughly different types of galaxies and different redshifts, because galaxies at 
high redshift tend to be fainter and redder. To create these cells, we first sort the training set 
galaxies by magnitude and color, so that each cell is created starting from the region where 
bright and red galaxies are on the observed color-magnitude diagram. We use slopes of -0.084 
and -0.60 for the two sets of parallel lines to create the cells. The slope of -0.084 is chosen 
based on the red sequence slope at z ~ 0.4 in $B - R_c$, and the other slope is determined
according to the galaxy distribution for different redshift bins on the color-magnitude diagram.
We let each cell grow by $\Delta(B - R_c) = 0.1$ and $\Delta R_c = 0.1$ in each step until it contains
at least 160 training set galaxies. Galaxies are distributed into the cells according to their
colors and magnitudes. The coefficients of Equation (1) are obtained by a linear regression
method in each color-magnitude cell using the training set galaxies. These coefficients are 
then applied to those galaxies in the same color-magnitude cell to estimate their redshifts. 
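
As an illustration, the per-cell regression could be sketched as an ordinary least-squares fit. This is a minimal sketch, not the code used for the catalogs: it assumes the polynomial of Equation (1) consists of a constant, linear magnitude terms, and squared color terms, and the function names `design_matrix`, `fit_cell`, and `predict_cell` are hypothetical.

```python
import numpy as np
from itertools import combinations

def design_matrix(mags):
    """Columns [1, m_i, (m_i - m_j)^2] built from an (n_gal, n_band) magnitude array."""
    mags = np.asarray(mags, dtype=float)
    n_gal, n_band = mags.shape
    cols = [np.ones(n_gal)]
    cols += [mags[:, i] for i in range(n_band)]
    # Assumed form of the color terms: one squared color per passband pair.
    cols += [(mags[:, i] - mags[:, j]) ** 2
             for i, j in combinations(range(n_band), 2)]
    return np.column_stack(cols)

def fit_cell(train_mags, train_z):
    """Least-squares coefficients for the training galaxies of one cell."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(train_mags),
                                 np.asarray(train_z, dtype=float), rcond=None)
    return coeffs

def predict_cell(coeffs, mags):
    """Photometric redshifts for galaxies falling in the same cell."""
    return design_matrix(mags) @ coeffs
```

In use, `fit_cell` would be called once per color-magnitude cell on that cell's training galaxies, and `predict_cell` applied to the survey galaxies assigned to the same cell.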



We adopt the method in Hsieh et al. (2005) to estimate photometric redshift uncertainties.
To estimate the photometric redshift uncertainties due to fitting, we bootstrap
the training set galaxies in each color-magnitude cell 300 times with the assumption of perfect
photometry for each galaxy. On the other hand, to evaluate the contribution from
photometric uncertainties, we use a Monte-Carlo method to simulate galaxy magnitudes in
each passband for 300 draws with Gaussian photometry uncertainties assumed. With these
$300 \times 300$ realizations, we build the photometric redshift probability density of each galaxy
and take the r.m.s. dispersion as the photometric redshift uncertainty for the galaxy. The
photometric redshift of each galaxy is the median value of these $300 \times 300$ realizations.

To investigate how well the empirical photometric redshift uncertainties resemble the 
true ones, we define the empirical photometric redshift uncertainty $\sigma_{emp}$ as the median
empirical photometric redshift uncertainty of the training set galaxies in a color-magnitude
cell. We compute the dispersion between photometric and spectroscopic redshifts in the
same cell and take it as the true uncertainty, denoted as $\sigma_{true}$. We find that there is a linear
correlation between $\sigma_{emp}$ and $\sigma_{true}$, but not of unity slope. Therefore, we scale the empirical
photometric redshift uncertainties and the probability densities by a factor of $\sigma_{true}/\sigma_{emp}$ in
each color-magnitude cell.
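
The bootstrap and Monte Carlo procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: the fit is linear in the magnitudes for brevity (not the full polynomial), every Monte Carlo draw is evaluated with every bootstrap solution to give the `n_boot * n_mc` realizations, and the function name `photoz_realizations` is hypothetical.

```python
import numpy as np

def photoz_realizations(train_mags, train_z, mags, mag_err,
                        n_boot=300, n_mc=300, seed=0):
    """Redshift realizations for one galaxy (array of length n_boot * n_mc)."""
    rng = np.random.default_rng(seed)
    train_mags = np.asarray(train_mags, dtype=float)
    train_z = np.asarray(train_z, dtype=float)
    n_train = len(train_z)
    # Simplified design matrix: constant plus linear magnitude terms.
    A = np.column_stack([np.ones(n_train), train_mags])
    # Monte Carlo draws of the galaxy magnitudes with Gaussian errors.
    draws = rng.normal(mags, mag_err, size=(n_mc, len(mags)))
    D = np.column_stack([np.ones(n_mc), draws])
    out = np.empty((n_boot, n_mc))
    for b in range(n_boot):
        # Bootstrap: resample the training set with replacement and refit.
        idx = rng.integers(0, n_train, size=n_train)
        coeffs, *_ = np.linalg.lstsq(A[idx], train_z[idx], rcond=None)
        out[b] = D @ coeffs
    return out.ravel()
```

The median of the returned array would serve as the photometric redshift and its standard deviation as the empirical uncertainty, which would then be rescaled by $\sigma_{true}/\sigma_{emp}$ for the galaxy's cell.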

We drop one of every ten galaxies in our training set (398 galaxies in total) and we 
estimate the redshift of these galaxies using the remaining training set galaxies (3590 galaxies 
in total), so that these two sets are independent, ensuring an unbiased estimation of the 
performance of our photometric redshift technique. The comparison of photometric redshift 
and spectroscopic redshift for this subset is illustrated in Fig. 2. The dispersion of
$\Delta z = z_{phot} - z_{spec}$ is ~ 0.060 for these 398 galaxies using $BVR_cz'$ photometry for $0 < z_{spec} < 1$.
The photometric redshift uncertainties computed using the technique described above are
shown in Fig. 3 as functions of galaxy magnitude and color. We note that the computed
photometric redshift uncertainties increase for fainter and bluer galaxies. We also apply the
solutions to all the galaxies in the training set and find that redshift uncertainties increase
for galaxies at higher redshift, with $\Delta z \sim 0.060$ and ~ 0.134 for galaxies at 0.3 < z < 0.6
and 0.6 < z < 0.9, respectively. 



3. The Completeness Weight 

Even though the photometric redshift technique can be used to estimate a redshift for a 
large number of galaxies economically, the method may fail for extremely faint galaxies and 
galaxies with unreliable redshift. Thus, these galaxies should be excluded from the sample. 
Galaxy counts must be corrected to account for such rejections. The selection of galaxies in 
a photometric redshift catalog can be based on (1) photometric redshift ranges which allow 
the 4000 Å break to be within one of the passbands, and (2) the total probability within a
desired redshift range to ensure the quality of photometric redshift measurement. 

We set the redshift range to be 0.02 < z < 1.4, where the upper photometric redshift 
limit is due to the passband wavelength coverage for the 4000 Å break in our training set. We
also select galaxies whose total probability within $3\sigma_{z_{cut}}$ of their central photometric redshift
is greater than 99.7%, where $\sigma_{z_{cut}}$ is set as $\sigma_{z_{cut}} = 0.2(1 + z)$.

As we select whether an object is in the sample or not, a completeness correction weight
$W_i$ is assigned to each galaxy. Since we find that both red and blue galaxies have similar
completeness correction weights, the completeness factor is estimated using the ratio of the
total galaxy number within a $\Delta m_{R_c} = 0.1$ magnitude bin to the total galaxy number satisfying
our selection in the same magnitude bin. In general, this completeness weight becomes larger
for fainter galaxies. Therefore, we set a nominal apparent magnitude cutoff based on where
$W_i = 2$ to avoid galaxies of high weights, if this apparent magnitude cutoff is brighter than
the limiting magnitude of the sample. 
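
The completeness weight can be illustrated with a short sketch. It assumes a single magnitude array for all detected galaxies and a boolean mask marking those that pass the selection; the function name `completeness_weights` and the handling of empty bins (infinite weight) are choices made here for illustration only.

```python
import numpy as np

def completeness_weights(all_mags, selected, bin_width=0.1):
    """Per-galaxy weight = (all galaxies in bin) / (selected galaxies in bin)."""
    all_mags = np.asarray(all_mags, dtype=float)
    selected = np.asarray(selected, dtype=bool)
    lo = np.floor(all_mags.min() / bin_width) * bin_width
    bins = np.floor((all_mags - lo) / bin_width).astype(int)
    n_bins = bins.max() + 1
    n_all = np.bincount(bins, minlength=n_bins)
    n_sel = np.bincount(bins[selected], minlength=n_bins)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Bins with no selected galaxy get an infinite weight.
        w_bin = np.where(n_sel > 0, n_all / n_sel, np.inf)
    return w_bin[bins]
```

The nominal magnitude cutoff would then correspond to the brightest bin in which the weight reaches 2.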



4. Parameters for the Friendship 

We develop a group-finding algorithm using photometric redshift. We follow the idea of 
the well-known friends-of-friends algorithm in angular separation; however, we consider the 
conditional photometric redshift probability in the redshift direction. 



4.1. The 2D Linking Length, $D_0$



The standard friends-of-friends algorithm (FoF; Geller & Huchra 1983) identifies overdense
regions by looking for galaxies closer to one another than a given cutoff separation. A
group forms from a seed galaxy. Galaxies satisfying the linking criterion to this seed galaxy 
are linked together. A galaxy group is defined by the chains of such finding procedures using 
every linked galaxy as a new seed. We adopt this linking idea in our algorithm to search 
for group members in the transverse direction. Given a fixed 2D reference linking length 
$D0_{xy}$ at $z = 0$, the linking length used to unite galaxies should be scaled as $D0_{xy}/(1+z)$ for the
sake of forming groups of similar over-density. However, in an apparent-magnitude limited
survey, criteria based on the distance between galaxies have to consider the variation of the
mean galaxy separation with redshift (Marinoni et al. 2002; Eke et al. 2004). The apparent
magnitude cutoff of a survey causes sparser galaxy number density at higher redshift. In
order to form galaxy groups of similar over-density regions throughout the survey, the linking
length should take into account the varying absolute magnitude cutoffs at different redshifts.
We take the standard Schechter luminosity function, $\phi(M_{R_c})$, with a luminosity evolution
[...] blue galaxies (Lin et al. 1999) [...] (Kodama & Arimoto 1997) [...] $-21.41$ and the
faint end slope $\alpha = -1.20$. The linking length then scales as

$$D_0 \propto R_w^{1/2},$$

where $R_w$ is the scaling factor of Equation (2). In Equation (2), $M_{cut}$ is the desired absolute
magnitude depth and $M_{lim}$ is the absolute
magnitude limit corresponding to the apparent magnitude limit of the survey. This scaling
factor $R_w$ is unity if $M_{cut} < M_{lim}$.
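
One form of the scaling factor consistent with this description is written out below. It is an illustrative assumption, not necessarily the expression used to build the catalogs: the ratio of the Schechter number density integrated to the desired depth $M_{cut}$ to that integrated to the survey limit $M_{lim}$, so that $R_w \ge 1$ behaves like a weight.

```latex
R_w =
\begin{cases}
\dfrac{\int_{-\infty}^{M_{cut}} \phi(M)\,dM}{\int_{-\infty}^{M_{lim}} \phi(M)\,dM},
  & M_{lim} < M_{cut}, \\[2ex]
1, & M_{cut} \le M_{lim}.
\end{cases}
```

Under this assumption, a survey that only reaches $M_{lim}$ detects a fraction $1/R_w$ of the galaxies brighter than $M_{cut}$ per unit area, the mean projected separation grows as $R_w^{1/2}$, and a linking length scaled by the same factor keeps the over-density threshold roughly constant.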



We increase the linking length by $\sqrt{\sum_i W_i / N}$ to conserve the local galaxy number density
after the removal of unreliable galaxies, where N is the total number of galaxies joined into
a group and $W_i$ is the completeness weight (described in §3) of each linked galaxy.

In practice, our linking length used to search for connected galaxies in the transverse 
direction is expressed as: 

$$D_0 = \sqrt{\frac{\sum_{i=1}^{N} W_i}{N}}\; R_w^{1/2}\; \frac{D0_{xy}}{1+z}. \qquad (3)$$
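
A numerical sketch of the linking length follows. It assumes that the three scalings described in this section simply multiply, and that the luminosity-function factor enters as $R_w^{1/2}$ with $R_w \ge 1$; the function name `linking_length` and its argument names are hypothetical.

```python
import numpy as np

def linking_length(d0_xy, z, weights, r_w=1.0):
    """Transverse linking length for a group at redshift z.

    d0_xy   : reference linking length at z = 0
    weights : completeness weights W_i of the linked galaxies
    r_w     : luminosity-function scaling factor (assumed >= 1)
    """
    weights = np.asarray(weights, dtype=float)
    return np.sqrt(weights.mean()) * np.sqrt(r_w) * d0_xy / (1.0 + z)
```

For a complete sample at $z = 0$ (all weights equal to one and `r_w = 1`), the function returns the reference length unchanged.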



4.2. The Redshift Friendship Criterion, $P_{ratio,crit}$

In the ideal situation where there is no uncertainty in the redshift, the occurrence of a 
galaxy or group at its redshift is a $\delta$ function. From a statistical viewpoint, the occurrence
of an event in photometric redshift space for each galaxy is independent in the sense that 
the photometric redshift of each galaxy is estimated by applying a set of solutions from an 
empirical method. Given that galaxy A, galaxy B, ... , and galaxy n with photometric 
redshift probability densities $P_A(z)$, $P_B(z)$, ..., and $P_n(z)$ form a group in redshift, the group
redshift density is the likelihood for all these n members to occur at the same redshift:

$$P_{group}(z) = P_A(z)\,P_B(z) \cdots P_n(z).$$

Therefore, the main idea of our group-finding algorithm is to narrow down the photometric 
redshift uncertainty of a group by way of joining new galaxy members, because the group 
redshift is where all members in the same group may occur. 
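
The narrowing of the group redshift density can be illustrated on a redshift grid. This sketch assumes Gaussian member densities sampled on a common grid and renormalizes the product to unit area; `gaussian` and `group_density` are hypothetical helper names.

```python
import numpy as np

def gaussian(z_grid, mu, sigma):
    """Gaussian probability density sampled on z_grid."""
    return np.exp(-0.5 * ((z_grid - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def group_density(z_grid, member_pdfs):
    """Product of the member densities, renormalized to unit area."""
    product = np.prod(np.asarray(member_pdfs, dtype=float), axis=0)
    # Trapezoid rule written out explicitly.
    area = np.sum(0.5 * (product[1:] + product[:-1]) * np.diff(z_grid))
    if area <= 0.0:
        raise ValueError("member densities do not overlap on this grid")
    return product / area
```

For two members at $z = 0.40$ and $z = 0.44$ with $\sigma = 0.05$ each, the product peaks midway between them and is narrower than either member density.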

Whether a galaxy is in the same redshift space as another galaxy is determined by the 
overlapping probability based on their photometric redshift probability densities. We use a 
probability ratio, $P_{ratio}$, as the criterion to set the membership in redshift. The $P_{ratio}$ for
galaxy $i$ with respect to the group redshift density is defined as

$$P_{ratio} = \frac{\int_0^\infty P_i(z)\,P_{group}(z)\,dz}{\max P}.$$

The numerator is the total probability density for galaxies to occur at the same redshift. The
denominator is the maximum value of the numerator, which occurs when all the galaxies are
at the same redshift. To clarify the $P_{ratio}$ concept we assume two galaxies with Gaussian
photometric redshift probability densities $P_{z_1,\sigma_1}(z)$ and $P_{z_2,\sigma_2}(z)$, where $z_1$ and $z_2$ are the
photometric redshifts for these two galaxies and $\sigma_1$ and $\sigma_2$ are the uncertainties. The total
probability for the galaxies to occur at the same redshift is:

$$\int_0^\infty P_{z_1,\sigma_1}(z)\,P_{z_2,\sigma_2}(z)\,dz \simeq \frac{1}{\sqrt{2\pi(\sigma_1^2+\sigma_2^2)}} \exp\left[-\frac{(z_1 - z_2)^2}{2(\sigma_1^2+\sigma_2^2)}\right].$$

The maximum total probability, $\max P$, occurs when $z_1 = z_2$.

We limit these two galaxies so that they must have $z_1$ and $z_2$ separated by less than
$\sigma_1 + \sigma_2$. Based on this qualification of friendship, the extreme case is when $z_2$ is $\sigma_1 + \sigma_2$
apart from $z_1$. It is worth noting that the total probability itself is unsuitable as the friendship
guideline, because this total probability depends on the standard deviations
of the two photometric redshift probability density functions. We find that $P_{ratio}$ ranges
from ~ 0.37 for two Gaussian probability densities of $\sigma_1 = \sigma_2$ and $|z_2 - z_1| = \sigma_1 + \sigma_2$, to
~ 0.50 when one of the $\sigma$ is small relative to the other. We set a criterion, $P_{ratio,crit}$, as the
friendship criterion in redshift. For any galaxies to be joined together, they must have their
$P_{ratio} \ge P_{ratio,crit}$.
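
The quoted range of the probability ratio can be checked with a short sketch. Two assumptions are made here: for Gaussian densities the ratio reduces to the exponential factor of the overlap integral, and for densities sampled on a uniform grid the maximum overlap is taken as the peak of their cross-correlation over all relative shifts. The function names are hypothetical.

```python
import numpy as np

def gaussian(z_grid, mu, sigma):
    """Gaussian probability density sampled on z_grid."""
    return np.exp(-0.5 * ((z_grid - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def p_ratio_gaussian(z1, sigma1, z2, sigma2):
    """Closed-form overlap ratio for two Gaussian densities."""
    return np.exp(-0.5 * (z1 - z2) ** 2 / (sigma1 ** 2 + sigma2 ** 2))

def p_ratio_numeric(z_grid, pdf_a, pdf_b):
    """Overlap integral divided by its maximum over relative shifts (uniform grid)."""
    dz = z_grid[1] - z_grid[0]
    overlap = np.sum(pdf_a * pdf_b) * dz
    # The cross-correlation scans every relative shift of the two densities.
    max_overlap = np.correlate(pdf_a, pdf_b, mode="full").max() * dz
    return overlap / max_overlap
```

With $\sigma_1 = \sigma_2 = 0.05$ and a separation of $0.10$ the ratio is $e^{-1} \approx 0.37$; with $\sigma_1 = 0.05$, $\sigma_2 = 0.01$ and a separation of $0.06$ it is close to $0.50$.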



5. The Probability Friends-of-Friends Algorithm

5.1. The Algorithm

The algorithm starts with a seed galaxy, and treats every galaxy in the sample as a
seed. Steps to form a group are as follows.

Step 1: The seed galaxy

• A 2D linking length is calculated based on this seed galaxy's photometric redshift and
completeness weight (Equation (3)).

• Galaxies within this length from the seed are searched in the transverse direction.

• Among those galaxies encircled by the linking length, the galaxy chosen as the seed's
companion is the one that has the maximum $P_{ratio}$ relative to the seed galaxy and satisfies the
condition $P_{ratio} > P_{ratio,crit}$.

Step 2: The proto-group

• The seed and its companion form a proto-group.

• Calculate the photometric redshift probability density $P_{group}(z)$ for the proto-group.

• Assign the peak of $P_{group}(z)$ as the redshift of this proto-group.

• Re-calculate the linking length and absolute magnitudes for these two galaxies based 
on the assigned proto-group redshift. 

• Re-verify the membership of the companion by checking that:

(1) the companion is still enclosed by the updated linking length;

(2) the companion still has the maximum $P_{ratio}$ to the seed; and

(3) the revised absolute magnitudes for both the seed and the companion still satisfy
the sample depth $M_{R_c}^{cut}$ criterion.

• A proto-group is confirmed if the membership is verified. 

Step 3: The primary group 

• Examine other galaxies located within the linking length to the seed galaxy using the 
redshift and linking length based on the proto-group. 

• From these remaining galaxies, a new member is chosen which satisfies the
$P_{ratio} > P_{ratio,crit}$ condition, and also has the highest $P_{ratio}$ to the group photometric redshift
probability density.

• Re-calculate the group photometric redshift probability density and the linking length 
with the new member included. 

• Re-compute the absolute magnitude of each linked galaxy using the updated group
redshift.

• Re-check the membership of all connected galaxies by the $D_0$ and $M_{R_c}^{cut}$ criteria.

• Repeat the procedure until all the galaxies enclosed by the seed galaxy's linking length 
have been examined. 

• A primary group is formed. 

Step 4: The friends-of-friends 

• A new member is selected using a procedure similar to Step 1 in choosing the com- 
panion, but applied to galaxies within the linking length of any members in this primary 
group. 

• Repeat the process for all members of the primary group until there are no more 
additional galaxies linked or rejected. 

• A 'mini-group' is formed. The prefix 'mini-' refers to the group associated with each 
seed galaxy. 

Step 5: The mini-groups 

• Steps 1 to 4 are carried out for all galaxies. Since each galaxy in our sample is 
considered as a seed galaxy, each galaxy has its own mini-group. 

Step 6: Unifying mini-groups 

The procedure of unifying mini-groups is necessary since a galaxy may be a member of
many mini-groups. The unifying principles are similar to those used to form mini-groups;
that is, mini-groups must have some common members and satisfy the $P_{ratio,crit}$ threshold in
order for them to merge into a more massive group. Terminologically, we refer to the mini-group
formed using seed galaxy X as 'mini-group X'. We detail the procedures below, with
mini-group A having $N-1$ other members $X_i$, where $i = 1$ to $N-1$.

• If the photometric redshift probability density of mini-group $X_i$ satisfies the $P_{ratio,crit}$
criterion with respect to that of mini-group A, all members of mini-group $X_i$ are added to
the member list of mini-group A; otherwise, galaxy $X_i$ will be removed from the member
list of mini-group A. The addition and removal of galaxies from mini-group A takes place
only after all mini-groups $X_i$ have been checked.

• Since the process of merging or removal will affect the redshift probability density 
of mini-group A and hence may fragment the mini-group, the following criteria must all 
be satisfied individually for a surviving member and its mini-group members after the 
merging process above: 

(1) the member satisfies the $P_{ratio,crit}$ criterion with respect to the updated mini-group A probability density;

(2) the member has at least one member of mini-group A within the linking length; and

(3) the member is still brighter than $M_{R_c}^{cut}$ at the updated group redshift.

In some circumstances, an original member of a mini-group may have already been 
flagged as belonging to other merged group(s). For instance, the member list of mini-group 
A is mini-group A = {A, n2, n3, n4, ... , n8, n9}, where n2, n3, n4 also belong to 'grp#1',
while n6 and n7 are members of 'grp#2'. The subsequent classification of mini-group A's
members may belong to one of the following cases:

(1) If all mini-group A's members have their $P_{ratio}$ satisfying the $P_{ratio,crit}$ criterion to
all overlapping groups (i.e., 'grp#1' and 'grp#2'), the member lists of mini-group A and the
overlapping groups are merged together and all these groups share the same group ID. In
other words, mini-group A acts as a 'bridge' connecting these overlapping
merged groups.

(2) If some of the mini-group A's members have $P_{ratio} > P_{ratio,crit}$ to an overlapping
group (e.g., 'grp#1') and some other mini-group A's members satisfy the $P_{ratio,crit}$ criterion
to another overlapping group (e.g., 'grp#2'), the member list of mini-group A is delisted
and all its members are classified into these overlapping groups. For the situation that some
of the mini-group A's members satisfy the $P_{ratio,crit}$ criterion to more than one overlapping
group, these members are classified into the overlapping group of the best $P_{ratio}$.

• After every mini-group has been examined, a final group catalog is established. 
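
The greedy growth of a primary group around one seed (Steps 1 to 3) can be sketched as below. This is a deliberately reduced illustration, not the full procedure: the linking length is held fixed, the absolute-magnitude re-checks and the mini-group unification are omitted, positions are assumed to be flat-sky coordinates in the same units as the linking length, and the maximum overlap is approximated by the cross-correlation peak on a uniform redshift grid. All function and argument names are hypothetical.

```python
import numpy as np

def overlap_ratio(pdf_a, pdf_b, dz):
    """Overlap integral divided by its maximum over relative shifts."""
    overlap = np.sum(pdf_a * pdf_b) * dz
    max_overlap = np.correlate(pdf_a, pdf_b, mode="full").max() * dz
    return overlap / max_overlap if max_overlap > 0 else 0.0

def primary_group(seed, xy, pdfs, z_grid, d0, p_crit):
    """Grow a group around `seed`, adding the best-matching neighbour each round."""
    xy = np.asarray(xy, dtype=float)
    pdfs = np.asarray(pdfs, dtype=float)
    dz = z_grid[1] - z_grid[0]
    dist = np.hypot(*(xy - xy[seed]).T)
    candidates = [int(i) for i in np.flatnonzero(dist <= d0) if i != seed]
    members = [int(seed)]
    group_pdf = pdfs[seed].copy()
    while candidates:
        ratios = [overlap_ratio(pdfs[i], group_pdf, dz) for i in candidates]
        best = int(np.argmax(ratios))
        if ratios[best] < p_crit:
            break  # no remaining neighbour satisfies the redshift criterion
        new = candidates.pop(best)
        members.append(new)
        # Update and renormalize the group redshift density.
        group_pdf = group_pdf * pdfs[new]
        group_pdf /= group_pdf.sum() * dz
    z_group = z_grid[np.argmax(group_pdf)]
    return members, z_group
```

Running this for every galaxy as a seed would give the starting point for the mini-groups; the friends-of-friends extension and the unification rules above would still need to be applied on top.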
5.2. Discussion



We name our group-finding algorithm 'Probability Friends-of-Friends' (pFoF) for its two
main characteristics of group redshift probability density and the FoF 2D linking. The main 
feature in our group finding procedure is the dynamic linking. The group redshift probability 
density and the linking length keep being refined through the entire process and are used to 
re-check all connected galaxies in this group. Some interesting points are: 

(a) The normalized group redshift probability density is reconstructed every time
a galaxy joins or is rejected from this group as

$$P_{group}(z) = \frac{\prod_{i=1}^{N} P_i(z)}{\int_0^\infty \prod_{i=1}^{N} P_i(z)\,dz}.$$

(b) The use of the above group redshift probability density in calculating $P_{ratio}$ for a
new galaxy can be interpreted as the probability for this new galaxy to be in this group,
given N members at the same redshift.

(c) The absolute magnitudes of the connected members are re-computed, and the members
are re-checked using the updated linking length every time any galaxy is connected
or rejected.

(d) Algorithmically, a single galaxy is considered as a group as well. In subsequent anal- 
ysis, we set a minimum of five galaxies in a group to exclude groups with too few galaxies, 
so that group redshift can be well confined by its members. 

One different approach in applying this 'photometric redshift probability density' idea 
in group finding, in place of mini-groups and the unifying procedure, is to continue Step 4 
until no more new members are linked. However, we find that this alternative group finding 
procedure may break a massive group (usually, a galaxy cluster) into several pieces in redshift 
space, especially in the region where the galaxy number density is extremely high, such as 
the core of a cluster. This happens because the formation of a massive galaxy aggregation 
has confined the group redshift to be in a narrow redshift space, and gives no flexibility for 
other galaxies of sufficiently different photometric redshifts to join in. These 'other galaxies' 
are usually the outliers in the comparison of the photometric and spectroscopic redshifts 
for individual galaxies. The idea of unifying mini-groups reduces the degree of the splitting 
of massive galaxy aggregations, but this still cannot be absolutely avoided unless higher 
accuracy photometric redshift measurements are available. 

In carrying out the group finding, we sort the sample galaxies by their peak values of 
the photometric redshift probability densities. The role of galaxy orders mainly lies in the 
steps of unifying 'mini-groups', where the existing 'mini-groups' (or merged ones) are used 
to combine with more 'mini-groups' with lower ranks. Using mock catalogs (see §7), we have
tested the effect of the ordering of the seed galaxies and found that it has negligible influence
on the results. We still decide to sort our catalogs by the peak values of the photometric 
redshift probability, so that each final group grows from the 'mini-groups' of seed galaxies 
with the best quality. 



6. The pFoF Group Properties 

6.1. Group richness 

We use $N_{gz}$ to denote the number of linked galaxies. The group richness, $N_{gal}$, is
indicated by the total completeness weight $W_i$ for galaxies in the group with background
galaxy counts subtracted:

$$N_{gal} = \sum_i W_i - A_{grp}\,\Sigma_{grp,bg},$$

where $A_{grp}$ is the group area and $\Sigma_{grp,bg}$ is the contaminating background galaxy surface
density within the group. These two quantities are estimated from the data as described
in the following two subsections. In other words, $N_{gal}$ is the net number of members in a
pFoF group. We select pFoF groups which contain at least five physically linked galaxies
(i.e., $N_{gz} \ge 5$) so that the group redshift can be well restricted by the members.



6.2. Background galaxy density in galaxy groups 

The background galaxy surface density is estimated from the complete photometric 
redshift catalogs; in our case, the RCS1 CFHT patches (Hsieh et al. 2005). We apply the 
same cutoffs both in magnitude and photometric redshift as our galaxy sample selection. The 
completeness weight for each galaxy is considered as well. We then calculate the number of 
background galaxies per Mpc$^2$ in photometric redshift bins of 0.01, and express it as $\Sigma_{bg}(z)$.
This $\Sigma_{bg}(z)$ has taken the scaling factor $R_w$ (Equation (2)) into consideration.

The pFoF algorithm allows us to constrain group redshift within $\Delta z_{grp} < 0.02$ although
photometric redshift uncertainties of member galaxies can be as large as $\sigma_{emp} \sim 0.070$.
Therefore, to estimate the background galaxy contamination within a galaxy group, we
should consider the photometric redshift space within which all members of a group may
occur, i.e., the likelihood. Accordingly, to form the likelihood, we sum the photometric
redshift probability densities of all members and normalize the peak of this summed photometric
redshift distribution to unity, denoted as $L(z)$. The background galaxy density for
this group is estimated using this photometric redshift likelihood as

$$\Sigma_{grp,bg} = \int_0^\infty L(z)\,\Sigma_{bg}(z)\,dz. \qquad (4)$$

The $L(z)$ has broader wings and a wider width than $P_{group}(z)$. $\Sigma_{grp,bg}$ is
underestimated if $P_{group}(z)$ is used instead in Equation (4). This is because $L(z)$ represents
the redshift that a galaxy in a group could have if we drew it from that group.
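
The background integral of Equation (4) can be sketched numerically. The sketch assumes that the member densities and the background surface density are sampled on the same redshift grid and that the background is expressed per unit redshift; the function name `group_background_density` is hypothetical.

```python
import numpy as np

def group_background_density(z_grid, member_pdfs, sigma_bg):
    """Integrate L(z) * Sigma_bg(z), with L(z) the peak-normalized summed density."""
    summed = np.sum(np.asarray(member_pdfs, dtype=float), axis=0)
    likelihood = summed / summed.max()  # peak normalized to unity
    integrand = likelihood * np.asarray(sigma_bg, dtype=float)
    # Trapezoid rule written out explicitly.
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(z_grid))
```

For a single Gaussian member of width $\sigma$ and a constant background, the result is the background level times $\sigma\sqrt{2\pi}$, which makes explicit how broader member densities raise the estimated contamination.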



6.3. Projected group area 

Geometrically, the mean separation among N galaxies uniformly distributed over an
area A is

$$\langle s \rangle = \sqrt{A/N}.$$

If we assign each galaxy a circular area of radius r, we should expect the total area of these
circles centered at individual galaxies to be the same as the total area A, i.e.:

$$N\pi\,(k\langle s \rangle)^2 = A,$$

where $r = k\langle s \rangle$. Consequently,

$$k = \frac{1}{\langle s \rangle}\sqrt{\frac{A}{N\pi}} = \frac{1}{\sqrt{\pi}}.$$



We calculate the projected group area using an empirical method. Each member in a group
is assigned a radius $r = \langle s \rangle/\sqrt{\pi}$, where $\langle s \rangle$ is computed as $1/\sqrt{\Sigma_{grp,bg}}$. We then draw
a rectangular box of area $A_{reg}$ with the length and width enclosing the R.A. and Dec.
range of the circles centered at each group member. N random uniformly distributed points
are cast over this rectangular box. By counting the number ($N_{in}$) of these N points
within the distance r of any group member, the projected group area is computed as

$$A'_{grp} = \frac{N_{in}}{N}\,A_{reg}.$$

Consequently, the estimated background galaxy number in a pFoF group is calculated as
$N'_{bg} = A'_{grp}\,\Sigma_{grp,bg}$. However, since galaxies are not distributed uniformly, this background
estimation must be considered as a lower limit. Tests performed on mock catalogs allow 
us to cross-check the true and computed contaminating background galaxy counts within a 
galaxy group. From these tests, we find that the computed background galaxy counts in a 
pFoF group are correlated with the true number of contaminating galaxies, but not with a 
unity slope (see §7.2.1). Hence, equivalently, we can apply an empirical correction to the
projected group area to obtain an effective area, so that the background galaxy counts are
properly estimated:

$$A_{grp} = 1.634\,\frac{N_{in}}{N}\,A_{reg} - \frac{2.505}{\Sigma_{grp,bg}}, \qquad (5)$$

based on the results from simulated catalogs. We note that the empirical corrections are 
similar (within 10%) for a variety of linking criteria and sample selections. 
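
The area estimate described in this section can be sketched as a short Monte Carlo routine. This is a minimal illustration, not the implementation used for this paper: it assumes flat-sky Cartesian coordinates (no cos(Dec) correction), and the function names, the number of random points, and the seed are hypothetical choices.

```python
import math
import random

def projected_group_area(members, sigma_grp_bg, n_points=20000, seed=0):
    """Monte Carlo estimate of the uncorrected area (N_in / N) * A_reg.

    members: list of (x, y) positions in flat-sky coordinates (assumption).
    sigma_grp_bg: background surface density per unit area in the same units.
    """
    rng = random.Random(seed)
    mean_sep = 1.0 / math.sqrt(sigma_grp_bg)        # <s>
    r = mean_sep / math.sqrt(math.pi)               # radius assigned to each member
    xs = [x for x, _ in members]
    ys = [y for _, y in members]
    x_lo, x_hi = min(xs) - r, max(xs) + r           # box enclosing all circles
    y_lo, y_hi = min(ys) - r, max(ys) + r
    a_reg = (x_hi - x_lo) * (y_hi - y_lo)
    n_in = 0
    for _ in range(n_points):
        px = rng.uniform(x_lo, x_hi)
        py = rng.uniform(y_lo, y_hi)
        # A point counts once even if it falls inside several overlapping circles.
        if any((px - mx) ** 2 + (py - my) ** 2 <= r * r for mx, my in members):
            n_in += 1
    return n_in / n_points * a_reg

def effective_group_area(area_uncorrected, sigma_grp_bg, slope=1.634, offset=2.505):
    """Equation 5: slope * (uncorrected area) - offset / Sigma_grp,bg."""
    return slope * area_uncorrected - offset / sigma_grp_bg
```

For a single isolated member the uncorrected area approaches $\pi r^2 = 1/\Sigma_{grp,bg}$, i.e., exactly one expected background galaxy, which is a convenient sanity check on the routine.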



7. Testing pFoF Algorithm on Mock Catalogs 
7.1. Mock Catalogs 

To assess the quality of the pFoF algorithm, we perform tests using mock catalogs
obtained from the Virgo Consortium Millennium Simulation (Springel et al. 2005)
using the semi-analytical modeling of galaxy evolution by Croton et al. (2006). Groups in the
simulation are identified by a FoF group-finder with a linking length of 0.2 of the mean
particle separation (Croton et al. 2006). We prune those FoF halos which contain only
one or two galaxies, and define galaxies in these poor FoF halos as field galaxies.

Our mock catalogs contain ~800,000 galaxies in $BRI$ magnitudes with $R_{AB} < 26.0$
and redshifts extending from 0 to 1.4, in a total of 5.0 square degrees of sky area from five
cones. For the purpose of testing our algorithm, we convert the photometry in the mock
catalogs to the Vega system, and set a cutoff of $m_{R_c} < 22.5$ to mimic a flux-limited sample.
With this apparent magnitude cutoff, the sample becomes incomplete at $M^{k,e}_{R_c,{\rm cut}} = M^*_{R_c} + 2.0$
at $z_{\rm cut} = 0.412$. To simulate photometric redshifts for the total of 177,344 galaxies in our
mock sample, we take the following steps.

• We construct photometric-redshift functions using our training set galaxies in each 
spectroscopic redshift bin with size of 0.05. The histogram of the computed photometric 
redshifts of these galaxies in each bin is normalized to have an area of unity, which forms 
the photometric redshift distribution function for that redshift bin. 

• The photometric-redshift distribution functions are then used to draw a photometric 
redshift for each galaxy in the mock sample in the corresponding redshift bin, so that any 
offset between photometric and spectroscopic redshifts in the real observational samples can 
be mimicked. The use of the photometric-redshift distribution function derived from the 
actual sample also ensures that the dispersion between the simulated photometric and true 
redshifts increases toward higher redshifts. 

• Each galaxy in the mock sample is then tagged with a photometric redshift probability 
density centered at its simulated photometric redshift. The tagged photometric redshift
probability density is based on that associated with a training set galaxy of similar color
and magnitude. This enables us to obtain reasonable dependence of photometric redshift 
probability density on galaxy color and magnitude, so that the distributions of photometric 
redshift uncertainties for galaxies in the mock sample are similar to those of our training set 
galaxies. 
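
The first two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for this paper: the histogram bin width of 0.01 in photometric redshift, the input format, and the function names are hypothetical, and a redshift bin without training galaxies would raise a `KeyError`.

```python
import random

def build_photoz_functions(training, zspec_bin=0.05, hist_bin=0.01):
    """Build a unit-area photo-z histogram for each spectroscopic-redshift bin.

    training: iterable of (z_spec, z_phot) pairs from the training set.
    Returns {bin index: [(left edge of histogram cell, density), ...]}.
    """
    counts = {}
    for z_spec, z_phot in training:
        key = int(z_spec // zspec_bin)
        cell = int(z_phot // hist_bin)
        counts.setdefault(key, {})
        counts[key][cell] = counts[key].get(cell, 0) + 1
    functions = {}
    for key, hist in counts.items():
        total = sum(hist.values())
        # Dividing by (total * hist_bin) makes the histogram integrate to one.
        functions[key] = [(c * hist_bin, hist[c] / (total * hist_bin)) for c in sorted(hist)]
    return functions

def draw_photoz(z_true, functions, rng, zspec_bin=0.05, hist_bin=0.01):
    """Draw a simulated photo-z for a mock galaxy with true redshift z_true."""
    pdf = functions[int(z_true // zspec_bin)]
    lefts = [left for left, _ in pdf]
    weights = [density for _, density in pdf]
    left = rng.choices(lefts, weights=weights, k=1)[0]   # pick a histogram cell
    return left + rng.random() * hist_bin                # uniform within the cell

# Toy training set (hypothetical values), all in the 0.30-0.35 redshift bin.
training = [(0.31, 0.305), (0.32, 0.335), (0.33, 0.345), (0.34, 0.305)]
functions = build_photoz_functions(training)
rng = random.Random(1)
draws = [draw_photoz(0.325, functions, rng) for _ in range(200)]
```

Drawing from the empirical histogram, rather than from an analytic error model, is what lets the simulated sample inherit any systematic offset present in the training set.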

The dispersion between the simulated photometric redshift and actual galaxy redshift for
galaxies in the mock sample is ~0.061 at $0.3 < z_{\rm mock} < 0.6$, and ~0.122 at $0.6 < z_{\rm mock} < 0.9$
(compared with 0.060 and 0.134 in our real data set).

After the simulated photometric redshifts are obtained, we apply the sample selection
criteria to the galaxies in the mock sample. The completeness factor $w_i$ is computed
and assigned to each galaxy satisfying the selection. We find that $w_i \sim 1.29$ at $m_{R_c} = 22.5$.
We also select galaxies in the mock catalogs brighter than $M^*_{R_c} + 2$ after applying
approximate k- and evolution corrections. A total of 72,954 galaxies are in our final selected
mock sample, and the median $w_i$ is ~1.09. We refer to this simulated photometric redshift
sample resembling our real data as the '$z_{\rm simulated}$' sample.

7.2. Test Results 

We apply our pFoF group-finding algorithm to the mock photometric redshift sample
with fiducial parameters $P_{\rm ratio,crit} = 0.37$ and $D0_{xy} = 0.25$ Mpc. We use the mock
photometric redshift sample itself as the control field for background subtraction.

7.2.1. contaminating background galaxies 

Background galaxy contamination correction is essential for any work using photometric 
redshifts. The photometric redshift technique can be an effective tool in scientific analysis, if 
the estimated and true background galaxy contamination are comparable to each other. For 
each pFoF group, we estimate the number of background galaxies as $N'_{bg} = A'_{grp}\Sigma_{grp,bg}$, as
described in §6.3. With the mock catalogs, we can count the actual contaminating galaxies,
$N_{bg,actual}$, i.e., galaxies contributed by the field, by other halos, or by both. By comparing $N'_{bg}$
and $N_{bg,actual}$ in each true pFoF group, we find that $N'_{bg}$ tends to be underestimated when
$N_{bg,actual}$ is large, and the trend can be approximated by the linear relation
$N_{bg,actual} = 1.634 \times N'_{bg} - 2.505$. We therefore apply the linear relation to correct $N'_{bg}$ by adjusting the
group area $A'_{grp}$ (Equation 5). We use $N_{bg}$ to denote the number of estimated background
galaxies with the linear correction applied.

7.2.2. Test 1: the recovery rate

To test the performance of our pFoF algorithm, we first investigate the group recovery
rate of the mock sample. We apply our pFoF group-finding algorithm to the mock '$z_{\rm simulated}$'
sample with $P_{\rm ratio,crit} = 0.37$ and $D0_{xy} = 0.25$ Mpc. The mock groups which have at
least three members brighter than our sample magnitude cutoffs (i.e., $m_{R_c} < 22.5$ and
$M^{k,e}_{R_c} < M^*_{R_c} + 2.0$) are selected as the reference groups, with a total number of 705 at
$z < z_{\rm cut}$.

We use the following matching procedure. Since every galaxy has a pFoF group ID in 
the output files of the pFoF algorithm, we classify each member of a mock reference group 
by its pFoF group ID. The members of a given mock group may belong to different pFoF 
groups. We define the pFoF group which matches the mock group as the one that contains 
the largest number of members of the mock group and also satisfies $N_{gal} > 3$ and $N_{gz} > 5$.
Each pFoF group is allowed to match with only one reference mock group. If there is more 
than one reference mock group recovered by the same pFoF group, only one of these reference 
mock groups is classified as 'recovered'. 
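
The matching procedure above can be sketched as follows. This is an illustration rather than the code used for this paper. Two points are assumptions not fixed by the text: the best match is chosen among the pFoF groups that already satisfy the richness cuts, and when several mock groups point to the same pFoF group, the first one encountered is the one counted as recovered. All names and the input format are hypothetical.

```python
from collections import Counter

def match_mock_to_pfof(mock_members, pfof_id, pfof_ngal, pfof_ngz,
                       min_ngal=3.0, min_ngz=5):
    """Return {mock group id: matched pFoF group id} for recovered mock groups.

    mock_members: {mock group id: [galaxy ids]}
    pfof_id:      {galaxy id: pFoF group id} for galaxies assigned to a pFoF group
    pfof_ngal, pfof_ngz: {pFoF group id: N_gal} and {pFoF group id: N_gz}
    """
    matches = {}
    claimed = set()                       # each pFoF group may match only one mock group
    for mock_id, galaxies in mock_members.items():
        tally = Counter(pfof_id[g] for g in galaxies if g in pfof_id)
        eligible = [(n, gid) for gid, n in tally.items()
                    if pfof_ngal[gid] > min_ngal and pfof_ngz[gid] > min_ngz]
        if not eligible:
            continue
        _, best = max(eligible)           # pFoF group holding the most mock members
        if best in claimed:
            continue                      # already used to recover another mock group
        claimed.add(best)
        matches[mock_id] = best
    return matches

# Toy example (hypothetical ids): mock groups "A" and "B" both point to pFoF group 10.
mock_members = {"A": [1, 2, 3, 4], "B": [5, 6, 7]}
pfof_id = {1: 10, 2: 10, 3: 10, 4: 11, 5: 10, 6: 10, 7: 12}
pfof_ngal = {10: 6.2, 11: 3.5, 12: 4.0}
pfof_ngz = {10: 8, 11: 6, 12: 6}
matches = match_mock_to_pfof(mock_members, pfof_id, pfof_ngal, pfof_ngz)
recovery_rate = len(matches) / len(mock_members)
```

In this toy case only group "A" is counted as recovered, so the recovery rate is 0.5.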

The results of the recovery test are presented in Fig. 4. The Y-axis in Fig. 4 is the
fraction of recovered to total reference mock groups with halo mass greater than a
cutoff (i.e., the X-axis). The recovery rate increases with halo mass. The pFoF
algorithm recovers more than 80% of the reference mock groups of halo mass greater than
~$1.2 \times 10^{13}\,M_\odot$, and recovers all mock groups of halo mass greater than ~$3.4 \times 10^{13}\,M_\odot$.
The numbers of reference mock groups with masses above these two
limits are 147 and 41, respectively. The r.m.s. dispersion in redshift between the recovered
reference mock groups and the matched pFoF groups is ~0.044, and it improves to
~0.038 for groups with halo mass greater than ~$3.4 \times 10^{13}\,M_\odot$.

7.2.3. Test 2: the fractions of false detections and serious projections 

To investigate the fraction of false pFoF groups, we examine every member of a pFoF
group to see in which mock halo it is located. With $P_{\rm ratio,crit} = 0.37$ and $D0_{xy} = 0.25$
Mpc, we have a total of 1,019 pFoF groups as the reference, selected with $N_{gal} > 3$, $N_{gz} > 5$,
and $z_{\rm pFoF} < z_{\rm cut}$. A pFoF group is flagged as 'false detection' if either:

(1) all its members are field galaxies (i.e., galaxies in poor mock halos
containing fewer than three galaxies), or

(2) it contains fewer than three members from the mock group that contributes the most
members.

We present the results in the top panel of Fig. 5. The Y-axis is the fraction of false
pFoF groups (over the total) with $N_{gal}$ greater than a cutoff (the X-axis). The fraction of
false groups decreases with increasing $N_{gal}$. The false detection rate is 30% for pFoF groups
of $N_{gal} > 5.85$, and is 10% when $N_{gal} > 7.91$. There are 222 and 79 pFoF groups with $N_{gal}$
greater than these two richness cutoffs, respectively. We note that a pFoF group of $N_{gal} \sim 8$
corresponds to a halo mass of ~$3.7 \times 10^{13}\,M_\odot$. We find that the fraction of false groups increases
toward higher redshift. In these tests, all the false pFoF groups with $N_{gal} > 8$ are located at
$z > 0.34$.

A pFoF group may contain multiple mock groups if an inappropriate $P_{\rm ratio}$ or $D0_{xy}$
is used. To examine the fraction of such pFoF groups, we flag a pFoF group as 'serious
projection' if two or more mock groups contribute similar numbers of galaxies to the pFoF
membership. Using $N_1$ and $N_2$ to denote the numbers of galaxies in a pFoF group from
mock groups #1 and #2, with $N_1 > N_2$, this pFoF group is flagged as 'serious projection'
if $N_1/N_2 < 1.5$. The results are presented in the middle panel of Fig. 5, where the Y-axis is
the fraction of 'serious projection' groups among the pFoF groups with $N_{gal}$ greater than a cutoff
on the X-axis. The fraction of 'serious projection' groups is about 5% for all $N_{gal}$ cutoffs below
~10.
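
The two flags defined in this test can be sketched as a single classification function. This is an illustration under stated assumptions, not the code used for this paper: the 'serious projection' flag is evaluated only for groups that are not false detections, field galaxies are labeled with `None`, and the names are hypothetical.

```python
from collections import Counter

FIELD = None  # label for galaxies in poor mock halos (fewer than three galaxies)

def classify_pfof_group(member_halos, min_members=3, ratio_limit=1.5):
    """Return (false_detection, serious_projection) for one pFoF group.

    member_halos: the mock halo label of each pFoF member, FIELD for field galaxies.
    """
    tally = Counter(h for h in member_halos if h is not FIELD)
    ranked = [n for _, n in tally.most_common()]       # counts, largest first
    n1 = ranked[0] if ranked else 0
    n2 = ranked[1] if len(ranked) > 1 else 0
    # Condition (1), all members from the field, gives n1 == 0, so both
    # conditions reduce to n1 < min_members.
    false_detection = n1 < min_members
    serious_projection = (not false_detection) and n2 > 0 and n1 / n2 < ratio_limit
    return false_detection, serious_projection
```

For example, a pFoF group with four members from one mock halo and three from another has $N_1/N_2 \approx 1.33$ and is flagged as 'serious projection', while a group with six and three members ($N_1/N_2 = 2$) is not.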



7.2.4. Test 3: the effect of magnitude limit

To test how sample depth affects the pFoF performance, we repeat Test 1 and Test 2
with two additional $M^{k,e}_{R_c,{\rm cut}}$ cutoffs: $M^{k,e}_{R_c,{\rm cut}} = M^*_{R_c} + 1.0$ and $M^{k,e}_{R_c,{\rm cut}} = M^*_{R_c} + 1.5$.
The results are listed in Table 1 and overplotted in Figures 4 and 5 as the dashed and dotted
curves.

The number of recovered mock groups increases with increasing sample depth, but the
fraction of false groups increases as well when $M^{k,e}_{R_c,{\rm cut}}$ changes from $M^*_{R_c} + 1.0$ to $M^*_{R_c} + 2.0$.
We therefore conclude that samples with shallower depths miss a larger portion of true
groups, especially the poorer ones; going deeper into the luminosity function increases the
identification of true galaxy groups with a higher, but still acceptable, false detection rate.
Based on these tests of different $M^{k,e}_{R_c,{\rm cut}}$ cutoffs, we suggest that a sample should have a
depth of at least $M^*_{R_c} + 1.5$ in order to obtain better group finding results.

7.2.5. Test 4: the linking criteria

One of the critical issues in any group-finding algorithm based on the friends-of-friends 
algorithm is the choice in the values of the linking parameters. To probe how the linking 
criteria affect pFoF membership, we repeat Tests 1 and 2 while varying the values of $P_{\rm ratio,crit}$
and $D0_{xy}$. The results are listed in Table 1 and presented in Figures 6 to 8.

The tests of different linking criteria show that there is a dynamic relation between
$P_{\rm ratio,crit}$ and $D0_{xy}$. The use of larger linking lengths, while providing a better recovery rate,
tends to form more groups which are not truly physically related. The same trade-off, a higher recovery
rate accompanied by larger fractions of false detection and 'serious projection' groups, also applies to
tests using smaller $P_{\rm ratio,crit}$. Therefore, a set of $P_{\rm ratio,crit}$ and $D0_{xy}$ should be chosen as
a compromise between the recovery and false detection rates. We adopt $P_{\rm ratio,crit} = 0.37$
and $D0_{xy} = 0.25$ Mpc for further tests of our algorithm.

7.2.6. Test 5: Gaussian probability densities 

We check the performance of the pFoF algorithm under the assumption of Gaussian
photometric redshift probability densities. To do this, we take each galaxy's photometric
redshift and error as the mean and standard deviation to generate a Gaussian photometric
redshift probability density. We call these catalogs 'Gaussian', and refer to the
sample based on real photometric redshift probability densities (i.e., the '$z_{\rm simulated}$' sample)
as 'non-Gaussian'. The completeness correction weight $w_i$ is also calculated for the 'Gaussian' sample: $w_i$
is ~1.07 at $m_{R_c} = 22.5$, and the average $w_i$ is ~1.03. The estimated background counts
are re-computed using the 'Gaussian' sample, and are similar to those estimated using the
'non-Gaussian' sample.

The results of this test are illustrated as the dashed curves in Fig. 9. Compared with
the Test 1 results using the '$z_{\rm simulated}$' sample (the solid curves), the 'Gaussian' sample
recovers slightly more mock groups of halo mass less than $1.3 \times 10^{13}\,M_\odot$, but it fails to recover
as many mock groups of halo mass $(1.3-5.0) \times 10^{13}\,M_\odot$ as the '$z_{\rm simulated}$' sample. The
'Gaussian' sample has a smaller fraction of false pFoF groups, but a significantly larger
fraction (~13%) of the pFoF groups are flagged as 'serious projection'. A Gaussian photometric
redshift probability density is the simplest assumption in dealing with photometric redshift
uncertainties in group finding. The results in Fig. 9 using the 'Gaussian' and 'non-Gaussian'
('$z_{\rm simulated}$') samples suggest that the asymmetric shape of a galaxy's photometric redshift
probability density has a role in determining group membership.

7.2.7. Test 6: the uncertainties of photometric redshift measurement

To explore the performance of the pFoF algorithm as a function of photometric redshift
measurement uncertainty, we re-construct the simulated photometric redshift sample, and then
repeat Tests 1 and 2. We take the same steps as in §7.1 in generating photometric redshifts
and probability densities, but reduce the dispersion between the simulated photometric
redshift and mock galaxy redshift by a factor of 0.5. The probability densities are consequently
rescaled by the same factor. The overall dispersion between the simulated photometric
redshift and actual redshift is ~0.037 at $0 < z_{\rm mock} < 0.6$ and ~0.069 at $0.6 < z_{\rm mock} < 0.9$. We
apply the same criteria in selecting the sample, and refer to this sample as '$z_{\rm half}$'.

The test results using '$z_{\rm half}$' are presented in Fig. 10 as the dash-dotted curves. The
'$z_{\rm half}$' sample recovers 4% fewer mock groups of $(2-4) \times 10^{13}\,M_\odot$ halo mass than the '$z_{\rm simulated}$'
sample (solid curve). However, the '$z_{\rm half}$' sample contains a much smaller fraction of false pFoF
groups: it is reduced by a factor of ~3 for $N_{gal} > 6$, and is equal to zero for $N_{gal} > 8$. Similarly,
the fraction of serious projection is about 2.6%, which is about half the rate of the '$z_{\rm simulated}$'
sample. This test shows that the recovery rate is not a strong function of photometric
redshift uncertainty, but the false detection and serious projection rates are.
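
One way to realize the rescaling used in this test is sketched below. It is an assumption-laden illustration, not the code used for this paper: the probability density is taken to be tabulated on a grid and centered on the simulated photometric redshift, and the function names are hypothetical.

```python
def shrink_photoz_offset(z_true, z_phot, factor=0.5):
    """Move z_phot toward z_true so that their offset is multiplied by factor."""
    return z_true + factor * (z_phot - z_true)

def rescale_pdf(z_grid, pdf, z_center, z_center_new, factor=0.5):
    """Compress a tabulated density about its center by factor, keeping unit area."""
    new_grid = [z_center_new + factor * (z - z_center) for z in z_grid]
    new_pdf = [p / factor for p in pdf]     # narrower by factor, taller by 1/factor
    return new_grid, new_pdf

def trapezoid_area(z_grid, pdf):
    """Area under a tabulated density, used to check the normalization."""
    return sum(0.5 * (pdf[k] + pdf[k + 1]) * (z_grid[k + 1] - z_grid[k])
               for k in range(len(z_grid) - 1))

# Toy example (hypothetical values): a triangular density centered at z_phot = 0.50
# for a galaxy whose true redshift is 0.40.
z_grid = [0.40, 0.45, 0.50, 0.55, 0.60]
pdf = [0.0, 5.0, 10.0, 5.0, 0.0]
z_half = shrink_photoz_offset(0.40, 0.50)
half_grid, half_pdf = rescale_pdf(z_grid, pdf, 0.50, z_half)
```

Dividing the density by the same factor that compresses the grid keeps the area at unity, so the rescaled function remains a valid probability density.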



7.2.8. Test 7: the use of accurate redshifts 

To examine how photometric redshift accuracy affects the pFoF algorithm, we repeat
Test 1 and Test 2, assigning to each galaxy its real redshift instead of the photometric one.
We call this sample 'z-mimic'. The photometric redshift probability densities for galaxies in
the 'z-mimic' catalogs are created in the same way as for the '$z_{\rm simulated}$' sample described in §7.1.

To test how the uncertainty in photometric redshift affects the pFoF results, we also
re-construct the 'z-mimic' catalogs but scale the widths of the probability densities to be
half as large (i.e., by a factor of 0.5), and refer to these as '$z_{\rm half}$-mimic' catalogs.

The test results using the 'z-mimic' and '$z_{\rm half}$-mimic' samples are presented in Fig. 10
as the dotted and dashed curves. Both the 'z-mimic' and '$z_{\rm half}$-mimic' samples have a better
recovery rate (> 80% for $7 \times 10^{12}\,M_\odot$) than the '$z_{\rm simulated}$' sample (> 80% for $1.2 \times 10^{13}\,M_\odot$).
The Test 2 results using the 'z-mimic' and '$z_{\rm half}$-mimic' samples show that the false detection
rates are ~10% for pFoF groups of $N_{gal} > 8$ and $N_{gal} > 5.88$, respectively. The 'serious
projection' fraction is ~3% on average for both samples. The performance of the pFoF
algorithm strongly relies on the accuracy of photometric redshift measurements, as well as
on the photometric redshift uncertainties of the individual galaxies (i.e., the width of the
photometric redshift probability density).

7.3. Effects of Galaxy Colors and Contamination of False Groups 

As shown in Fig. 3, the photometric redshift uncertainties are larger for blue galaxies
($B - R_c < 1.8$) than for red galaxies ($B - R_c > 1.8$) by a factor of ~1.5 on average. The
different photometric redshift uncertainties for blue and red galaxies may result in biases in
identifying galaxy groups.

To determine the significance of this effect, we test the pFoF algorithm using a
'blue-improved' sample, in which we artificially make the simulated photometric redshift
uncertainties for blue galaxies comparable to those of the red galaxies. We find that the
recovery rate is slightly better than that using the '$z_{\rm simulated}$' sample, by 2% for groups of
halo mass less than ~$3.4 \times 10^{13}\,M_\odot$, but the fraction of false groups is ~15% smaller
than for the '$z_{\rm simulated}$' sample. We note that the Test 6 results have shown a similar small
improvement in the recovery rate but a significant reduction in false detection rate when all 
the photometric redshift uncertainties become smaller. Accordingly, we conclude that the 
larger photometric redshift uncertainties in a subset of galaxies do not affect the recovery 
rate, but increase the false detection rate significantly. This is because we use photometric 
redshift probability densities in our group finding method, instead of using a fixed cutoff 
(based on some average redshift uncertainty) in photometric redshift space in determining 
group members. 

One of the main issues of having different photometric redshift uncertainties between 
red and blue galaxies is in estimating the true fraction of the galaxy populations. Because 
galaxies in groups populate regions of relatively higher number density compared with the 
field, more group galaxies are expected to scatter into the field than in the reverse direction 
due to their photometric redshift uncertainties. Therefore the estimated fraction of red (blue)
galaxies in a group is expected to be smaller (larger) than its true value, due to the larger
fraction of red galaxies in richer environments. To explore how significantly the true fraction
is affected, we compute the fraction of red galaxies in each galaxy group. We define
red galaxies as galaxies redder than the midpoint of the $B - R_c$ color difference between E
and Sc galaxies. For each recovered mock group, the true red galaxy fraction is computed
simply as the ratio of the number of red members to the total. For the matched pFoF groups
in the '$z_{\rm simulated}$' and 'blue-improved' samples, we estimate the red galaxy fraction using
Bayesian inference to account for the background contamination. We find that the estimated
red galaxy fractions in the '$z_{\rm simulated}$' and 'blue-improved' samples are comparable to each
other. However, the values are smaller, as expected, than the true values in the recovered
mock groups by ~13%.

Another concern in using photometric redshift groups in scientific analyses is the
contamination by false groups. In a real observational sample, it is difficult to distinguish false
groups from true groups. To estimate how the contamination by false groups affects
galaxy population analyses, we compute the red galaxy fraction in each pFoF group with
$N_{gal} > 8$. We find that the false groups have smaller red galaxy
fractions than the true pFoF groups. The mean red galaxy fractions are ~0.75 and
~0.28 for the true and false groups, respectively. Therefore, when computing the average
red galaxy fraction of all pFoF groups in a sample, the estimated value
can be biased low by ~0.05, assuming that ~10% of the groups are false
detections.
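
The size of this bias follows from a simple weighted mean, shown here as a short worked example. It assumes that every group enters the average with equal weight; the function name is hypothetical.

```python
def mixed_red_fraction(f_true, f_false, false_rate):
    """Mean red fraction of a sample in which a fraction false_rate of groups is false."""
    return (1.0 - false_rate) * f_true + false_rate * f_false

observed = mixed_red_fraction(0.75, 0.28, 0.10)   # 0.9 * 0.75 + 0.1 * 0.28
bias = 0.75 - observed                            # shortfall relative to true groups
```

With the values quoted above, the contaminated average is about 0.703, so the shortfall is about 0.047, consistent with the ~0.05 stated in the text.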



7.4. Examples of Recovered Groups 

Tests 1 to 7 allow us to conclude that our pFoF group finding algorithm is able to identify 
galaxy groups using photometric redshift samples, although the performance of group finding 
results depends on the accuracy of photometric redshift measurements. We summarize our 
test results using mock samples in Table [TJ In Fig rmfTSj we present two typical examples 
of the identified mock and pFoF groups obtained using $P_{\rm ratio,crit} = 0.37$ and $D0_{xy} = 0.25$
Mpc. In each figure, we show the sky locations of the mock group members. The members 
of the pFoF group which matches the mock group galaxies are marked by the crosses within 
a square. The simulated photometric redshift distribution of members in the mock group 
and the individual photometric redshift probability densities of the matched pFoF group 
members are presented in Fig. [T5] and Fig. [T3J 

Both of these figures show that the estimated pFoF group redshift probability density
(the dotted curve) has a smaller width than those of the individual members. In fact, the photometric
redshift uncertainty of individual galaxies is $\sigma_{\rm emp} \sim 0.070$, while the average estimated pFoF
group redshift ($z_{grp}$) uncertainty is $\Delta z_{grp} \sim 0.017$. This $\Delta z_{grp}$ is the width of a pFoF group
redshift probability density, and depends on the number of linked galaxies. However, there
is an offset between the actual and the estimated group redshifts. In our '$z_{\rm simulated}$' sample,
we find that these two sets of redshifts do not follow a correlation of unity slope. This effect
is likely related to the systematics in the photometric redshift estimation for individual
galaxies. Without taking such systematic offsets into consideration, the r.m.s. of the pFoF
group redshifts compared with the true ones is ~0.044. The r.m.s. is reduced to ~0.020
after correcting for such systematic offsets, and is in agreement with the estimated pFoF
group redshift uncertainties $\Delta z_{grp}$. Therefore, this r.m.s. dispersion can be considered as
the internal uncertainty of the redshift estimation, which is directly comparable to $\Delta z_{grp}$.

8. Testing pFoF Algorithm on CNOC2 Groups 

8.1. The Group Samples 

The CNOC2 group catalog was generated using a friends-of-friends algorithm with
$r_p^{\rm max} = 0.25h^{-1}$ Mpc and $r_z^{\rm max} = 5h^{-1}$ Mpc as the linking parameters in the transverse and
radial directions in a spectroscopic redshift sample (Carlberg et al. 2001). A total of 192
groups in an area of 1.5 square degrees were identified at a median redshift of 0.33. The 
average number of galaxies identified in each group is $N \sim 4$. The richness of CNOC2
groups is computed as $\eta_{\rm CNOC2} = \sum_i (w_{m,i}\,w_{z,i})$, where $w_{m,i}$ and $w_{z,i}$ are the weights based
on the magnitude and redshift selection functions (Yee et al. 2000). As a result, the group
richness is ~1.74 times greater than the number of identified group members, i.e., the true average
group richness is ~7. The four CNOC2 patches coincide with the RCS1 observations (Hsieh
et al. 2005), but do not have complete overlap. We apply the sample selection and the 
pFoF algorithm with $P_{\rm ratio,crit} = 0.37$ and $D0_{xy} = 0.25$ Mpc to the RCS catalogs overlapping
with the CNOC2 patches. Due to the incomplete coverage of the RCS, we have 109 of
the published CNOC2 groups in our sample. We set a redshift cut of $0.19 < z < 0.41$,
since the redshift distribution of the CNOC2 groups becomes incomplete beyond $z \sim 0.4$
(Carlberg et al. 2001).



8.2. The Group Finding Results 

We first check the pFoF performance on CNOC2 groups and subsequently we use pFoF 
groups to establish the completeness of the CNOC2 group sample. 

(1) Test I: the fraction of recovered CNOC2 groups 

To establish whether a pFoF group recovers a CNOC2 group, we measure the separation
between the CNOC2 and pFoF group centers. The reference CNOC2 groups are selected
with the criterion $\eta_{\rm CNOC2}/N < 2.5$ to remove highly incomplete groups. With this, we have
65 reference CNOC2 groups. We require that a matched pFoF group have its center
within 0.25 Mpc (the linking length used in Carlberg et al. 2001) of a CNOC2 group center,
and satisfy $N_{gal} > 3$. Fig. [T41 shows the recovery rate as a function of CNOC2 group richness
$\eta_{\rm CNOC2}$. The recovery rate is ~80% for the richness cutoff of $\eta_{\rm CNOC2} > 5$.

(2) Test II: the completeness of CNOC2 groups

To examine the completeness of CNOC2 groups, the reference pFoF groups are selected
with $N_{gal} > 3$ and $N_{gz} > 5$ in the same redshift range as the CNOC2 groups. We have 231
pFoF groups satisfying these conditions. In this case we also impose a maximum separation
of 0.25 Mpc between the pFoF and CNOC2 group centers. For the purpose of estimating the
sampling rate of the CNOC2 groups, we plot in Fig. [TU the ratio of matched reference pFoF
groups to the total as a function of group richness $N_{gal}$. Fig. [TJ] shows that ~50% of pFoF
groups with $N_{gal} < 20$ are matched with the CNOC2 groups. If we take the fraction of false
pFoF groups to be ~10% based on the results of Test 2 in §7.2, the result indicates that
the completeness rate of the CNOC2 groups is ~56% for poor groups, which is similar to
what was estimated (roughly 50%) in Carlberg et al. (2001).
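
The completeness quoted above follows from dividing the matched fraction by the fraction of pFoF groups assumed to be real. A short worked example, with a hypothetical function name:

```python
def corrected_completeness(matched_fraction, false_rate):
    """Matched fraction relative to the pFoF groups that are assumed to be true."""
    return matched_fraction / (1.0 - false_rate)

completeness = corrected_completeness(0.50, 0.10)   # 0.50 / 0.90
```

This gives approximately 0.556, i.e., the ~56% stated in the text.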



9. Summary 

We have presented a new group-finding algorithm, pFoF, for identifying galaxy groups 
using photometric redshift catalogs. We have tested our pFoF algorithm on both mock 
catalogs and the CNOC2 groups. We summarize the most relevant results below. 

Using the sample in which the simulated photometric redshifts resemble the real data, 
the comparisons between the pFoF and mock groups show that our algorithm produces
reasonable results: (1) more than 80% of the mock groups with halo mass above $1.2 \times 10^{13}\,M_\odot$ are
recovered, (2) the fraction of false groups is 10% for groups of $N_{gal} > 7.91$, and (3) ~5%
of pFoF groups are flagged as 'serious projection', for which the pFoF group members are
contributed by multiple mock groups. We find that the pFoF results strongly depend on
the sample depth. The samples should be sufficiently deep (~$M^*_{R_c} + 1.5$) into the
luminosity function for reliable group finding results. The use of samples with accurate redshift
measurements reveals that the false detection rate depends strongly on the photometric
redshift measurement accuracy. Application of the pFoF algorithm to the RCS-CNOC2 patches
shows good agreement for the CNOC2 groups with $0.19 < z < 0.41$.

The basic working principle of our pFoF algorithm is to improve the group redshift 
by joining new members. The average uncertainty in the estimated pFoF group redshift in 
our mock group tests is ~ 0.017, compared with the average uncertainty of 0.070 for the 
photometric redshifts of individual galaxies. While such group redshift uncertainty is still 
very large compared with groups spectroscopically identified, our results show that our pFoF 
algorithm reduces the photometric redshift uncertainties significantly. 

With our test results, we have demonstrated that our group-finding algorithm is able to
identify galaxy groups while accounting for photometric redshift uncertainties.
The purpose of this paper, the first in a series, is to provide a method for finding galaxy
groups (and clusters) in photometric redshift data sets. We will apply this pFoF
algorithm to the CNOC1 and RCS data sets. These data sets will provide us with a large
sample of galaxy groups at $0.2 < z < 0.6$, and enable us to study environmental dependence
of galaxy properties and their evolution.

Table 1: Mock Test Results

Sample          P_ratio,crit   D0_xy (a)   Sample depth    Recovery rate (b)   False detection rate (c)   Serious projection (d)
z_simulated     0.37           0.25        M*_Rc + 1.0     31%                 0%                         8%
z_simulated     0.37           0.25        M*_Rc + 1.5     67%                 0%                         5%
z_simulated     0.37           0.25        M*_Rc + 2.0     80%                 9%                         5%
z_simulated     0.25           0.15        M*_Rc + 2.0     65%                 0%                         3%
z_simulated     0.25           0.20        M*_Rc + 2.0     79%                 10%                        3%
z_simulated     0.25           0.25        M*_Rc + 2.0     80%                 16%                        8%
z_simulated     0.25           0.30        M*_Rc + 2.0     80%                 19%                        8%
z_simulated     0.37           0.15        M*_Rc + 2.0     61%                 0%                         5%
z_simulated     0.37           0.20        M*_Rc + 2.0     73%                 2%                         2%
z_simulated     0.37           0.25        M*_Rc + 2.0     80%                 9%                         5%
z_simulated     0.37           0.30        M*_Rc + 2.0     80%                 10%                        7%
z_simulated     0.50           0.15        M*_Rc + 2.0     55%                 5%                         1%
z_simulated     0.50           0.20        M*_Rc + 2.0     72%                 0%                         3%
z_simulated     0.50           0.25        M*_Rc + 2.0     76%                 2%                         2%
z_simulated     0.50           0.30        M*_Rc + 2.0     79%                 8%                         7%
Gaussian        0.37           0.25        M*_Rc + 2.0     82%                 9%                         13%
z_half          0.37           0.25        M*_Rc + 2.0     80%                 0%                         3%
z-mimic         0.37           0.25        M*_Rc + 2.0     90%                 10%                        3%
z_half-mimic    0.37           0.25        M*_Rc + 2.0     89%                 3%                         3%

(a) In Mpc.
(b) For mock groups of $M_{\rm halo} > 1.2 \times 10^{13}\,M_\odot$.
(c) For pFoF groups of $N_{gal} > 8$.






Fig. 1. — The training set galaxies are classified into 19 color-magnitude cells in our empirical 
photometric fitting method. The slopes for the two sets of parallel lines are -0.084 and -0.60 
to mimic the rough differentiation of different types of galaxies at various redshifts. 




Fig. 2. — The comparison between spectroscopic and photometric redshifts for 398 galaxies 
in BVRz' trained by 3590 training set galaxies quadratically. The dispersion in redshift 
difference is ~0.060 at $0 < z < 1$.

Fig. 3. — Left: The empirical photometric redshift uncertainties as a function of magnitude
for the 398 control test galaxies. The filled circles represent red galaxies ($B - R_c > 1.8$)
and the open circles represent blue galaxies ($B - R_c < 1.8$). Right: Similar to the left panel but as a
function of $B - R_c$ color. Filled circles are for bright galaxies ($R_c < 21.5$) and open circles
represent faint galaxies ($R_c > 21.5$).

Fig. 4. — The results of Test 1 using pFoF groups obtained with $P_{\rm ratio,crit} = 0.37$ and
$D0_{xy} = 0.25$ Mpc. The recovery rates as a function of mock group halo mass are shown in
the top panel. The three curves represent the results using three different sample depths
as indicated in the panel. The distribution of the reference mock group halo mass with
the $M^*_{R_c} + 2.0$ cutoff is shown as the unhatched histogram in the bottom panel, and the
recovered reference groups are presented as the hatched histogram.

Fig. 5. — Top: The fractions of false pFoF groups as a function of group richness for three
different sample depths as indicated in the panels. Middle: The fraction of pFoF groups
flagged as 'serious projection', which are pFoF groups containing members from multiple
mock groups. Bottom: The unhatched histogram is the richness distribution for the reference
pFoF groups in the sample of $M^*_{R_c} + 2.0$ depth. The numbers of false and 'serious projection'
pFoF groups are presented as the hatched histograms.




Fig. 6. — The recovery rate (Test 1) for pFoF groups obtained using varying $P_{\mathrm{ratio,crit}}$ and 
$D0_{xy}$. The left panels plot the results for various $D0_{xy}$ at each fixed $P_{\mathrm{ratio,crit}}$. The right 
panels show the results using fixed $D0_{xy}$ with varying $P_{\mathrm{ratio,crit}}$. 

Fig. 7. — Left: The false detection rates (Test 2) using various $D0_{xy}$ with a fixed $P_{\mathrm{ratio,crit}}$. 
Right: The fractions of 'serious projection' (Test 2) using the same set of $D0_{xy}$ and $P_{\mathrm{ratio,crit}}$. 




Fig. 8. — Same as Fig. 7, but keeping $D0_{xy}$ fixed and varying $P_{\mathrm{ratio,crit}}$. 

Fig. 9. — Left: The repeated Test 1 results (recovery rate) using the '$z_{\mathrm{simulated}}$' and 'Gaussian' 
samples (see Test 5 in §7.21). Right: The false detection rates (top) and the fractions of 
pFoF groups flagged as 'serious projection' (bottom) using the same two samples as in the left 
plot. 

Fig. 10. — Left: The repeated Test 1 results using the '$z_{\mathrm{simulated}}$', '^/', 'z-mimic', and 
'$z_{\mathrm{half}}$-mimic' samples (see Test 6 and Test 7 in §7.21). Right: The false detection rates (top) and 
the fractions of 'serious projection' (bottom) using the same four samples as in the left plot. 

Fig. 11. — The sky map of a rich mock group ($z = 0.237$, $M = 2.0 \times 10^{14} M_\odot$) and the pFoF 
groups in the same sky region. The solid dots are galaxies in the '$z_{\mathrm{simulated}}$' sample. The 
squares mark the position of each member of the mock group in the sample. The crosses and 
triangles indicate the members of two pFoF groups in this region, selected with $N_{\mathrm{gal}} > 10$ 
and $z_{\mathrm{pFoF}} < z_{\mathrm{cut}}$. The mock group is matched by the pFoF group plotted as crosses. Note that the 
other pFoF group (triangles) is completely separated from the matched one and is identified 
with another mock group at $z = 0.126$, demonstrating the ability of the pFoF algorithm to 
separate groups at different redshifts. 

Fig. 12. — Top: The histogram (0.01 bin size) of the simulated photometric redshifts of galaxies 
in the mock group (open histogram) and of members of the matched pFoF group (hatched 
histogram). The vertical dotted line indicates the mock group redshift. Bottom: The individual 
photometric redshift probability distributions of the matched pFoF group members 
(i.e., the pFoF group galaxies plotted as crosses in Fig. 11) are plotted as solid curves, and galaxies 
that belong to the pFoF group but not to this mock group halo (i.e., galaxies in the 
pFoF group that are projected background or foreground galaxies) are plotted as dashed curves. 
The group redshift distribution of this matched pFoF group is plotted as the dotted curve, 
with $z_{\mathrm{pFoF}} = 0.217 \pm 0.009$. 

Fig. 13. — Same as Fig. 11 and Fig. 12, but for a poorer mock group at $z = 0.327$ with 
a mass of $1.9 \times 10^{13} M_\odot$. The matched pFoF group has a richness of $N_{\mathrm{gal}} = 5.30$ at 
$z_{\mathrm{pFoF}} = 0.303 \pm 0.018$. 

Fig. 14. — Left: The recovery rate of pFoF as a function of CNOC2 group richness $N_{\mathrm{cnoc2}}$. 
Right: The fraction of matched reference pFoF groups to the total as a function of group 
richness N nrr) . 



